-
Notifications
You must be signed in to change notification settings - Fork 240
cuda.core.system: More device-related APIs #1465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds device-related APIs to cuda.core.system, focusing on InfoROM inspection and detailed PCI topology information.
Changes:
- Added new InfoROM inspection APIs including version retrieval, configuration checksum, and validation
- Extended PCI information with subsystem ID, class codes, link generation/width details, throughput, and replay counter
- Added device topology and P2P status query capabilities with support for CPU affinity-based device enumeration
- Introduced new device properties: index, module_id, minor_number, board_part_number, addressing_mode, display_mode/active, and repair_status
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| cuda_core/tests/system/test_system_device.py | Added comprehensive tests for new device APIs including PCI info extensions, InfoROM operations, topology queries, and device properties |
| cuda_core/docs/source/api.rst | Updated API documentation to include new classes and functions for device topology, P2P status, InfoROM, and PCI utilities |
| cuda_core/cuda/core/system/_inforom.pxi | New file implementing InforomInfo class with methods for version retrieval, validation, and BBX flush time tracking |
| cuda_core/cuda/core/system/_device.pyx | Extended PciInfo with new properties and methods, added Device properties and topology methods, new module-level topology/P2P functions, and RepairStatus class |
| cuda_bindings/cuda/bindings/_nvml.pyx | Refactored system event APIs to be more Pythonic, fixed device_get_last_bbx_flush_time to return tuple, added SystemEventData_v1 class, and corrected accounting_stats_dtype reserved field |
| cuda_bindings/cuda/bindings/_nvml.pxd | Updated function signatures to match refactored implementations |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This comment has been minimized.
This comment has been minimized.
|
/ok to test |
Co-authored-by: Copilot <[email protected]>
|
/ok to test |
|
/ok to test |
|
/ok to test |
|
/ok to test |
|
/ok to test |
|
/ok to test |
|
/ok to test |
|
/ok to test |
cpcloud
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
| cpdef unsigned int device_get_inforom_configuration_checksum(intptr_t device) except? 0 | ||
| cpdef device_validate_inforom(intptr_t device) | ||
| cpdef unsigned long device_get_last_bbx_flush_time(intptr_t device, intptr_t timestamp) except? 0 | ||
| cpdef tuple device_get_last_bbx_flush_time(intptr_t device) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a weird signature. Does this function really return a tuple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it does. It returns two numbers.
| assert isinstance(repair_status.tpc_repair_pending, bool) | ||
|
|
||
|
|
||
| @pytest.mark.skipif(helpers.IS_WSL or helpers.IS_WINDOWS, reason="Device attributes not supported on WSL or Windows") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about a helpers.IS_WINDOWS_ISH 🤣 (not serious)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a bad idea -- it's a pattern that's pretty common. Can do that as part of a separate sweep though (since it would also apply to non-cuda.core.system parts of cuda.core).
|
/ok to test |
|
This is sort of a grab bag of more APIs for
cuda.core.system, mainly related to InfoROM inspection and more detailed information about PCI topology.Checklist